ids-api-guide-python-2

Accessing International Debt Statistics (IDS) through World Bank Data API

Part 2 - Get and explore data

Python 3

This is the second part of a two-part series on accessing the International Debt Statistics (IDS) database through the World Bank Data API. In Part 1, we queried the World Bank Data API to retrieve indicator names and location codes. In this guide, we will use that information to explore the regional trends of long-term external debt stocks from the IDS database.

The code in this guide shows, step by step, how to:

  1. Set up your environment with the needed packages
  2. Input your data specifications (as selected in Part 1)
  3. Use the World Bank Data API call to return the specified data
  4. Explore the data through basic descriptive analysis and create a pretty chart.

1. Setup

To start, make sure you have the following packages installed on your machine: pandas, numpy, wbdata, and plotly. If you aren't familiar with how to install a Python package, see each package's documentation for instructions.

Then, open up your preferred mode of writing Python. This could be a Jupyter Notebook in JupyterLab, a code editor (like Atom or Visual Studio) paired with the command line, or just the command line. Now follow the rest of the steps below to retrieve and analyze the World Bank data.

In [25]:
# Importing packages
import pandas as pd
import numpy as np
import datetime
import wbdata
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "notebook" # use "pio.renderers" to see the default renderer
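
If one of the imports above fails, it can help to see which packages are missing before going further. The snippet below is a minimal sketch using only the standard library; the helper name missing_packages is hypothetical and not part of any of the packages used in this guide.

```python
import importlib.util

# Packages this guide imports (taken from the import cell above).
REQUIRED = ["pandas", "numpy", "wbdata", "plotly"]

def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Install these before continuing:", ", ".join(missing))
else:
    print("All required packages are available.")
```

Anything listed as missing can usually be installed with your environment's package manager (for example pip or conda).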

2. Data Specifications

You will specify the data you want to explore using the following parameters:

  • Indicator(s): The indicator code(s) for the data
  • Location(s): The location code(s) for the countries, regions, or income level categories
  • Time: Years

Indicator(s)

In this guide, we will be looking at "long-term external debt stock" from the IDS database. To find the indicator for the data in which you're interested, you can either explore the World Bank data catalog or use an API query as outlined in Part 1 of this series. The IDS indicators are also conveniently stored as a spreadsheet (LINK NEEDED) in this repo.

In [3]:
# Selecting the indicator
indicatorSelection = {"DT.DOD.DLXF.CD":"ExternalDebtStock"}

The text that follows the indicator code (in this case, "ExternalDebtStock") should be a description that helps you correctly identify the indicator. It becomes the column name in the resulting DataFrame. To call more than one indicator, add more indicator codes and descriptions to the dictionary.
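
As a sketch of what a multi-indicator selection might look like, the dictionary below adds a second entry. The second code, DT.DOD.DECT.CD, is assumed here to be the IDS code for total external debt stocks; confirm it against the indicator list from Part 1 before relying on it. The descriptions are free-form labels of your choosing.

```python
# Mapping of indicator code -> descriptive column label.
indicatorSelection = {
    "DT.DOD.DLXF.CD": "ExternalDebtStock",       # long-term external debt stock (used in this guide)
    "DT.DOD.DECT.CD": "TotalExternalDebtStock",  # assumption: total external debt stock
}

for code, label in indicatorSelection.items():
    print(f"{code} -> column '{label}'")
```

The rest of this guide continues with the single-indicator dictionary.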

Location(s)

To select a location by country, region, or income level category, you will need to know its 2- or 3-letter code. To find this code, you can either use an API query as outlined in Part 1 of this series or use the convenient location-codes csv* in this repo.

We will select regional aggregates (these exclude high-income countries):

  • ECA: Europe & Central Asia
  • SSA: Sub-Saharan Africa
  • SAS: South Asia
  • LAC: Latin America & the Caribbean
  • MNA: Middle East & North Africa
  • EAP: East Asia & Pacific

*The location-codes csv was created using the API query: http://api.worldbank.org/v2/sources/2/country/data

In [4]:
# Select the countries or regions
locationSelection = ["ECA","SSA","SAS","LAC","MNA","EAP"]

Time

Here you will select the time frame for your data series. The format for the date is year, month, day. We are selecting data from 2008 to 2017.

In [5]:
# Selecting the time frame
timeSelection = (datetime.datetime(2008, 1, 1), datetime.datetime(2017, 12, 31))
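
If you change the time frame often, a small helper can build the tuple from two years. This is only a convenience sketch; the function name year_range is hypothetical and it produces the same kind of tuple as the cell above.

```python
import datetime

def year_range(first_year, last_year):
    """Build a (start, end) tuple covering whole calendar years, inclusive."""
    if first_year > last_year:
        raise ValueError("first_year must not be later than last_year")
    start = datetime.datetime(first_year, 1, 1)
    end = datetime.datetime(last_year, 12, 31)
    return (start, end)

timeSelection = year_range(2008, 2017)
print(timeSelection[0].year, timeSelection[1].year)
```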

3. API Call

In this step, we will retrieve the data using the World Bank Data API call. The package "wbdata" can request information from the World Bank database as a dictionary containing full metadata or as a pandas DataFrame. In this example, we will request the data, with the parameters outlined above, as a pandas DataFrame.

In [6]:
# Making the API call and assigning the resulting DataFrame to "EXD"
EXD = wbdata.get_dataframe(indicatorSelection, 
                            country = locationSelection, 
                            data_date = timeSelection, 
                            convert_date = False)
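
To relate this call to the raw API, the sketch below assembles a World Bank Data API v2 query URL from the same kind of parameters. It only builds the string and sends no request. The helper name build_query_url is hypothetical, and the request that wbdata actually sends may differ (for example in paging or source parameters).

```python
from urllib.parse import urlencode

def build_query_url(indicator, locations, first_year, last_year):
    """Assemble a World Bank Data API v2 URL for one indicator (no request is sent)."""
    base = "https://api.worldbank.org/v2"
    # Several location codes are joined with semicolons in the path.
    path = f"/country/{';'.join(locations)}/indicator/{indicator}"
    # Keep ":" unescaped so the year range reads as first:last.
    params = urlencode({"date": f"{first_year}:{last_year}", "format": "json"}, safe=":")
    return f"{base}{path}?{params}"

url = build_query_url("DT.DOD.DLXF.CD", ["ECA", "SSA"], 2008, 2017)
print(url)
```

Pasting a URL like this into a browser is one way to look at the unprocessed JSON behind the DataFrame.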

If you want a quick preview of your freshly retrieved DataFrame, you can print the first 5 lines.

In [7]:
# Print the first 5 lines of the DataFrame
print(EXD.head())
                                                  ExternalDebtStock
country                                     date                   
East Asia & Pacific (excluding high income) 2017       1.263986e+12
                                            2016       1.158415e+12
                                            2015       1.035285e+12
                                            2014       1.029315e+12
                                            2013       8.397770e+11
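
The preview shows that country and date form a two-level index rather than ordinary columns. The sketch below rebuilds those five rows by hand so it runs offline, then looks up one value by region and year. It assumes the dates are strings, as the preview suggests when convert_date is False.

```python
import pandas as pd

# Rebuild the five previewed rows by hand (values copied from the output above).
region = "East Asia & Pacific (excluding high income)"
index = pd.MultiIndex.from_product(
    [[region], ["2017", "2016", "2015", "2014", "2013"]],
    names=["country", "date"],
)
preview = pd.DataFrame(
    {"ExternalDebtStock": [1.263986e12, 1.158415e12, 1.035285e12, 1.029315e12, 8.397770e11]},
    index=index,
)

# The names of the two index levels.
print(list(preview.index.names))

# Look up a single region-year value with a (country, date) tuple.
value_2017 = preview.loc[(region, "2017"), "ExternalDebtStock"]
print(value_2017)
```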

4. Explore the data!

Congratulations! At this point you should have the long-term external debt stock for regions (excluding high-income economies) from 2008 to 2017, all in a DataFrame called "EXD".

Now we can do the following:

  • Data Cleaning: Clean up the format to use in a table or populate a visualization
  • Visualization: Create a simple chart

Data Cleaning

As you saw in the preview of the data in section 3, the DataFrame's format needs to be cleaned up: country and date form a two-level index rather than ordinary columns. Reshaping the data into flat columns will get it ready to present in a table or in a visualization.

In [8]:
# Reshape the data
EXDreshaped = pd.DataFrame(EXD.to_records())
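
To see what the reshape does, the sketch below applies it to two hand-built rows shaped like the API result, alongside reset_index(), a common pandas alternative for the same job. The sample frame is a stand-in built for illustration, not the real API output.

```python
import pandas as pd

# Two hand-built rows shaped like the API result (values from the section 3 preview).
index = pd.MultiIndex.from_tuples(
    [("East Asia & Pacific (excluding high income)", "2017"),
     ("East Asia & Pacific (excluding high income)", "2016")],
    names=["country", "date"],
)
sample = pd.DataFrame({"ExternalDebtStock": [1.263986e12, 1.158415e12]}, index=index)

via_records = pd.DataFrame(sample.to_records())  # approach used in this guide
via_reset = sample.reset_index()                 # common alternative

# Both turn the index levels into ordinary columns.
print(list(via_records.columns))
print(list(via_reset.columns))
```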

The long-term external debt stock is currently expressed in single US dollars. To improve a table's or chart's readability, convert the values to billions and round them to 0 decimal places. To do this, create a function called "formatNum" that you can then run on your DataFrame.

In [9]:
# Creating a function that will change units to billions and round to 0 decimal places
def formatNum(x):
    y = x / 1000000000  # convert to billions
    z = round(y)        # round to 0 decimal places
    return z

# Running the function on the desired data column
EXDreshaped.ExternalDebtStock = formatNum(EXDreshaped.ExternalDebtStock)
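
As a quick check of the arithmetic, the sketch below runs the same conversion on two raw values from the section 3 preview. It redefines formatNum in compact form so the snippet is self-contained.

```python
def formatNum(x):
    """Convert a value in US dollars to billions, rounded to 0 decimal places."""
    return round(x / 1_000_000_000)

# Raw values from the section 3 preview for East Asia & Pacific.
raw_2017 = 1.263986e12  # 1,263.986 billion before rounding
raw_2013 = 8.397770e11  # 839.777 billion before rounding

print(formatNum(raw_2017))
print(formatNum(raw_2013))
```

On plain numbers Python's round returns an integer, whereas on a DataFrame column the values remain floats, which is why the cleaned table later shows values such as 1264.0.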

The next two code cells clean up the column headers and region names. The first renames the column headers. The second removes the redundant "(excluding high income)" from the region names. We can instead include that information in the title of the legend.

In [11]:
# Renaming column headers
EXDclean = EXDreshaped.rename(index=str, columns={
    "date":"Year",
    "country":"Region",
})

In [12]:
# Remove the "(excluding high income)" from each of the region names
EXDclean["Region"] = EXDclean["Region"].str.replace("(excluding high income)", "", regex=False)
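
Note that the replacement above leaves behind the space that preceded the parenthesis, so each region name ends with a trailing space. The sketch below shows, on a plain string, one way to avoid that by removing the suffix together with its leading space. This is an optional variation, not the approach used in the rest of this guide.

```python
suffix = " (excluding high income)"  # note the leading space
raw_name = "East Asia & Pacific (excluding high income)"

# Removing the suffix together with its leading space avoids a trailing space.
clean_name = raw_name.replace(suffix, "")
print(repr(clean_name))
```

On the DataFrame column, the equivalent would be a single .str.replace(suffix, "", regex=False) call.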

Now our data should be ready to present in a table or visualize in a chart. Let's take a look at the first five lines again so we can compare the cleaned up data to the raw output in section 3.

In [13]:
print(EXDclean.head())
                 Region  Year  ExternalDebtStock
0  East Asia & Pacific   2017             1264.0
1  East Asia & Pacific   2016             1158.0
2  East Asia & Pacific   2015             1035.0
3  East Asia & Pacific   2014             1029.0
4  East Asia & Pacific   2013              840.0
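
With the data in this shape, basic descriptive analysis takes only a few lines. As a minimal sketch, the snippet below rebuilds the five rows printed above and computes the year-over-year change in USD billion. The column name Change is introduced here for illustration.

```python
import pandas as pd

# The five East Asia & Pacific rows from the cleaned preview (USD billion).
eap = pd.DataFrame({
    "Region": ["East Asia & Pacific"] * 5,
    "Year": ["2017", "2016", "2015", "2014", "2013"],
    "ExternalDebtStock": [1264.0, 1158.0, 1035.0, 1029.0, 840.0],
})

# Sort oldest to newest so each difference is "this year minus the previous year".
eap = eap.sort_values("Year").reset_index(drop=True)
eap["Change"] = eap["ExternalDebtStock"].diff()

print(eap[["Year", "ExternalDebtStock", "Change"]])
```

The first year has no previous value to compare against, so its change is missing (NaN).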

Data Visualization

Now use the package "plotly" to create a basic line graph, similar to one from the blog post on the launch of IDS 2019.

In [26]:
# Defining the data source
source = EXDclean

# Creating the chart
chart = px.line(source, 
                x="Year",
                y="ExternalDebtStock",
                color="Region",
                title="Long-term External Debt Stock (USD billion)")
chart.update_layout(
                plot_bgcolor="white")

# Displaying the chart
chart